Original-Transcribed Text Alignment for Manyosyu Written by Old Japanese Language
Abstract
We are constructing an annotated diachronic corpus of the Japanese language. As part of this work, we are building a corpus of the Man’yōsyū, an Old Japanese poetry anthology. In this paper, we describe how to semiautomatically align the transcribed text with its original text so that the two can be cross-referenced in our Man’yōsyū corpus. Although we align the original characters to the transcribed words manually, we first align the transcribed and original characters with an unsupervised automatic alignment technique from statistical machine translation to reduce the manual work. We found that automatic alignment achieves an F1-measure of 0.83, which corresponds to 1–2 alignment errors per poem. Finding and correcting these errors is nevertheless less work-intensive and more efficient than fully manual annotation, and the alignment probabilities can be used to guide the correction. Moreover, we found that the alignment probabilities let us locate uncertain transcriptions in our corpus and compare them with other transcriptions.
Related articles
Can Word Segmentation be Considered Harmful for Statistical Machine Translation Tasks between Japanese and Chinese?
Unlike most Western languages, there are no typographic boundaries between words in written Japanese and Chinese. Word segmentation is thus normally adopted as an initial step in most natural language processing tasks for these Asian languages. Although word segmentation techniques have improved greatly both theoretically and practically, there still remain some problems to be tackled. In this...
Towards automatic speech recognition without pronunciation dictionary, transcribed speech and text resources in the target language using cross-lingual word-to-phoneme alignment
In this paper we tackle the task of bootstrapping an Automatic Speech Recognition system without an a priori given language model, a pronunciation dictionary, or transcribed speech data for the target language Slovene – only untranscribed speech and translations to other resource-rich source languages of what was said are available. Therefore, our approach is highly relevant for under-resourced...
The Effect of Post-text Written Corrective Feedback on Written Grammatical Accuracy: Iranian intermediate EFL learners
The main role and responsibility of second language writing teachers is to help learners to write with minimal errors. To do so, teachers need to provide students with appropriate types of feedback. In this research, the researchers examined the effect of post-text written corrective feedback on written grammatical accuracy of Iranian intermediate EFL learners. In the first phase, Nelson Profic...
Sentence-Style Conversion of Japanese News Article for Text-to-Speech Application
This paper proposes a method of sentence-style conversion for generating spontaneous Japanese speech in a text-to-speech synthesis system. Since written language is different from spoken language linguistically, the speech by direct reading of written Japanese texts might be unnatural. The method takes a fully rule-based approach to convert the sentence style and to complement sentences, which a...
Publication date: 2016